Abstract: In this paper, we consider the problem of robust system identification under sparse gross errors. In our problem, system parameters are observed through a Toeplitz matrix, and a few of the observations are corrupted by gross errors. We reduce this system identification problem to a sparse error correction problem with a Toeplitz-structured real-numbered coding matrix. We prove a performance guarantee for Toeplitz-structured matrices in sparse error correction. Thresholds on the percentage of correctable errors for Toeplitz-structured matrices are also established.

I. Introduction 

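In system identification, an unknown system state $x \in \mathbb{R}^m$ is often observed through a Toeplitz matrix $H \in \mathbb{R}^{n \times m}$, namely

$$y = Hx,$$

where the Toeplitz matrix $H$ is equal to

$$H = \begin{bmatrix} h_1 & h_0 & \cdots & h_{-m+2} \\ h_2 & h_1 & \cdots & h_{-m+3} \\ \vdots & \vdots & \ddots & \vdots \\ h_n & h_{n-1} & \cdots & h_{n-m+1} \end{bmatrix},$$

and we assume $n > m$ throughout this paper.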
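To make the indexing concrete, the following minimal Python sketch builds $H$ from $h = (h_{-m+2}, \ldots, h_n)$, a vector of $n+m-1$ entries, using the convention $H_{i,j} = h_{i-j+1}$ implied by the matrix above. The function name `toeplitz_from_h` and the list offset are illustrative choices, not part of the paper.

```python
def toeplitz_from_h(h, n, m):
    """Return the n x m Toeplitz matrix H (list of rows) with H[i][j] = h_{i-j+1}.

    Rows i and columns j are 1-based in the formula; `h` stores
    (h_{-m+2}, h_{-m+3}, ..., h_n), so h_t sits at list position t + m - 2.
    """
    if len(h) != n + m - 1:
        raise ValueError("h must contain n + m - 1 entries")
    H = []
    for i in range(1, n + 1):
        row = []
        for j in range(1, m + 1):
            t = i - j + 1             # subscript of the h appearing at (i, j)
            row.append(h[t + m - 2])  # shift so that t = -m + 2 maps to position 0
        H.append(row)
    return H


# Example: n = 4, m = 2, h = (h_0, h_1, h_2, h_3, h_4) = (0, 1, 2, 3, 4)
print(toeplitz_from_h([0, 1, 2, 3, 4], n=4, m=2))  # [[1, 0], [2, 1], [3, 2], [4, 3]]
```

Each diagonal of the result is constant, which is exactly the correlation between rows that the analysis below must cope with.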

If there is no interference or noise in the observation $y$, one can simply recover $x$ by solving the linear system (e.g., by matrix inversion). However, in some cases, a few elements of $y$ can be corrupted by large-magnitude gross errors. Such errors can arise from failures of measurement devices, errors in communicating the measurements, and interference by adversarial parties. In fact, gross errors, sometimes called bad data or outliers, can substantially degrade the estimate of the state. Thus, it is necessary to protect the estimates from these gross errors. Research along this direction has attracted significant attention; see, for example, [?].

Mathematically, when both additive measurement noise and gross errors are present, the observation $y$ can be written as

$$y = Hx + e + w,$$

where $e$ is a sparse gross error vector with $k \ll n$ nonzero elements, and $w$ is a measurement noise vector whose elements are i.i.d. Gaussian random variables. We focus on the case where $m$ is fixed, which is common in system identification.



In what follows, we are interested in recovering $x$ in the presence of sparse gross errors. This reduces to the now well-known problem of sparse error correction [?], [8], [9], [19]. While one could exhaustively search over all $\binom{n}{k}$ possible locations of the gross errors, this is computationally expensive. Based on [?], [?], it is natural to solve the following convex program:



$$\min_{x,z} \; \|y - Hx - z\|_1 \quad \text{subject to} \quad \|z\|_2 \le \epsilon. \qquad \text{(I.1)}$$



In this convex program, $z$ accounts for the additive noise, and the sparsity of $e$ is promoted by minimizing the $\ell_1$ norm of $y - Hx - z$. Note that this convex program reduces to the standard basis pursuit approach when there is no additive measurement noise $w$ and $z$ is removed from (I.1). The convex program then becomes

$$\min_{x} \; \|y - Hx\|_1. \qquad \text{(I.2)}$$
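The paper does not prescribe a solver for (I.2), which can be cast as a linear program. As a dependency-free illustration only, the sketch below approximates the minimizer of $\|y - Hx\|_1$ with iteratively reweighted least squares (IRLS), a technique different from the LP formulation, and checks that it recovers $x$ from a few gross errors. The helper names, the weight floor `eps`, the iteration count, and the synthetic data ($n = 80$, $m = 2$, three errors of magnitude 10) are all assumptions made for this example.

```python
import random


def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    size = len(A)
    M = [list(map(float, row)) + [float(b[r])] for r, row in enumerate(A)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, size):
            factor = M[r][c] / M[c][c]
            for q in range(c, size + 1):
                M[r][q] -= factor * M[c][q]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = M[r][size] - sum(M[r][q] * x[q] for q in range(r + 1, size))
        x[r] = acc / M[r][r]
    return x


def l1_regression_irls(H, y, iters=100, eps=1e-10):
    """Approximate argmin_x ||y - H x||_1 by iteratively reweighted least squares."""
    n, m = len(H), len(H[0])
    w = [1.0] * n  # the first pass is ordinary least squares
    x = [0.0] * m
    for _ in range(iters):
        # Weighted normal equations (H^T W H) x = H^T W y.
        A = [[sum(w[i] * H[i][a] * H[i][c] for i in range(n)) for c in range(m)]
             for a in range(m)]
        rhs = [sum(w[i] * H[i][a] * y[i] for i in range(n)) for a in range(m)]
        x = solve_linear(A, rhs)
        r = [y[i] - sum(H[i][j] * x[j] for j in range(m)) for i in range(n)]
        # Weight 1/|r_i| turns r_i^2 into |r_i|; eps avoids division by zero.
        w = [1.0 / max(abs(ri), eps) for ri in r]
    return x


rng = random.Random(0)
n, m, k = 80, 2, 3
h = [rng.gauss(0.0, 1.0) for _ in range(n + m - 1)]           # (h_{-m+2}, ..., h_n)
H = [[h[i - j + m - 1] for j in range(m)] for i in range(n)]  # 0-based i, j: h_{i-j+1}
x_true = [1.5, -0.7]
y = [sum(H[i][j] * x_true[j] for j in range(m)) for i in range(n)]
for i in rng.sample(range(n), k):
    y[i] += rng.choice([-1.0, 1.0]) * 10.0                    # sparse gross errors
x_hat = l1_regression_irls(H, y)
print(x_hat)
```

IRLS is used here only to avoid an LP dependency; any LP solver applied to (I.2) would serve the same purpose.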

In works that discuss sparse error correction problems [7], [?], [11], [12], [?], each element of the matrix $H$ is assumed to be an i.i.d. random variable following a certain distribution, for example, a Gaussian or Bernoulli distribution. Such matrices have been shown to obey the restricted isometry condition, and (I.2) can correctly recover $x$ when only gross errors are present, and can recover $x$ approximately when both gross errors and measurement noise exist. However, in the system identification problem, $H$ has a natural Toeplitz structure, and the elements of $H$ are correlated. The natural question is whether (I.2) also provides a performance guarantee for recovering $x$ when $H$ is a Toeplitz matrix. We answer this question affirmatively in this paper. To simplify the analysis, we focus on the case where only gross errors are present.

In this paper, we investigate the performance of Toeplitz-structured matrices in sparse error correction from the point of view of high-dimensional geometry. We show that, like matrices with i.i.d. random entries, Toeplitz-structured matrices also enable the successful recovery of $x$ via the convex program (I.2). The main contribution of this paper is a performance guarantee for Toeplitz-structured matrices in sparse error correction. In particular, we calculate thresholds on the sparsity $k$ such that any error vector with no more than $k$ nonzero elements can be corrected using (I.2).

There is a well-known duality between compressed sensing (see [10], [11]) and sparse error correction [?]: the null space of the underdetermined matrix in compressed sensing corresponds, in some sense, to the column space of the tall matrix $H$ in sparse error correction. Toeplitz and circulant matrices have been studied in compressed sensing in several papers [14], [15], [16]. These papers show that Toeplitz matrices are good for recovering sparse vectors from undersampled measurements. In contrast, in our model of sparse error correction, the signal itself is not sparse, and the linear system involved is overdetermined rather than underdetermined. Also, the null space of a Toeplitz matrix does not necessarily correspond to another Toeplitz matrix, so the problem studied in this paper is essentially different from those studied in [14], [15], [16].

The rest of this paper is organized as follows. In Sections II and III, we derive the strong threshold on $k$ such that all error vectors with no more than $k$ nonzero elements can be corrected with high probability. In Section IV, we calculate the weak threshold on $k$ such that a fixed sparse error vector is corrected with high probability. In Sections V and VI, we provide numerical results and conclude the paper by discussing extensions and future directions.

II. Strong Thresholds 

Now we consider thresholds on $k$ such that the system state can be correctly estimated under every sparse error vector with no more than $k$ nonzero elements. Our derivation is based on the following well-known theorem about the subspace spanned by $H$ (see, for example, [20]).

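Theorem 2.1: $\ell_1$ minimization (I.2) recovers $x$ for every error vector with at most $k$ nonzero elements if and only if every nonzero vector $w$ in the range of the matrix $H$ satisfies $\|w_K\|_1 < \|w_{\bar{K}}\|_1$ for every subset $K \subseteq \{1, 2, \ldots, n\}$ with cardinality $|K| \le k$, where $k$ is an integer and $\bar{K} = \{1, 2, \ldots, n\} \setminus K$.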
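For a single fixed vector $w$, the condition in Theorem 2.1 does not require enumerating subsets: $\|w_K\|_1$ grows and $\|w_{\bar{K}}\|_1$ shrinks as $K$ collects larger entries, so the worst $K$ with $|K| \le k$ consists of the $k$ entries of largest magnitude. The sketch below uses this standard observation (not stated in the paper) and cross-checks it by brute force on small vectors; the function names are hypothetical. It tests one $w$ only, whereas the theorem requires the condition for every nonzero $w$ in the range of $H$, which is what the rest of this section establishes probabilistically.

```python
from itertools import combinations


def satisfies_condition(w, k):
    """True if ||w_K||_1 < ||w_Kbar||_1 for every K with |K| <= k (w assumed nonzero).

    The worst K is the set of the k largest |w_i|, so only that set is tested.
    """
    mags = sorted((abs(v) for v in w), reverse=True)
    return sum(mags[:k]) < sum(mags[k:])


def satisfies_condition_brute_force(w, k):
    """The same check, enumerating every subset K with |K| <= k."""
    idx = range(len(w))
    for size in range(k + 1):
        for K in combinations(idx, size):
            inside = sum(abs(w[i]) for i in K)
            outside = sum(abs(w[i]) for i in idx if i not in K)
            if inside >= outside:
                return False
    return True


w = [0.5, -2.0, 1.0, 0.25, -0.75, 1.5]
print([satisfies_condition(w, k) for k in range(4)])  # [True, True, False, False]
```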

To simplify our analysis, we assume that $h = (h_{-m+2}, h_{-m+3}, \ldots, h_n)$ is a vector of i.i.d. Gaussian random variables with distribution $N(0, 1)$. We then only need to determine for which $k$ this condition is satisfied with high probability. The difficulty lies in proving the condition for every vector in the subspace spanned by $H$, while the elements of $H$ are not independent random variables. In our proof, we adopt the following strategy of discretizing the subspace spanned by $H$.

1. We consider the vectors $Hz$, where $z \in \mathbb{R}^m$ and $\|z\|_2 = 1$.

2. Construct a finite $\gamma$-net $V = \{v^1, \ldots, v^N\}$ of points on the unit $m$-dimensional Euclidean sphere such that, for some $\gamma > 0$ and for any point $z$ on the unit Euclidean sphere, there is an $l$ with $\|z - v^l\|_2 \le \gamma$.

3. Establish the concentration of measure phenomenon for the points of the discrete $\gamma$-net.

4. Use the concentration of measure phenomenon on the $\gamma$-net to prove concentration of measure for all unit-norm vectors $z$ on the unit Euclidean sphere in $\mathbb{R}^m$.
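Step 2 can be made concrete in the simplest case $m = 2$, where the unit sphere is the unit circle. The sketch below (an illustration under that assumption; `circle_net` and `distance_to_net` are hypothetical helpers) places $N$ equally spaced points, with $N$ chosen so that every unit vector lies within Euclidean distance $\gamma$ of the net, and compares $N$ with the general volumetric bound $(1 + 2/\gamma)^m$ used in the proof of Lemma 2.4.

```python
import math


def circle_net(gamma):
    """Equally spaced points forming a gamma-net of the unit circle (0 < gamma <= 2)."""
    # Every unit vector is within angle pi/N of a net point, i.e. within chord
    # length 2*sin(pi/(2N)); choose the smallest N that makes this <= gamma.
    N = math.ceil(math.pi / (2 * math.asin(gamma / 2)))
    return [(math.cos(2 * math.pi * l / N), math.sin(2 * math.pi * l / N))
            for l in range(N)]


def distance_to_net(net, z):
    """Euclidean distance from z to its nearest net point."""
    return min(math.hypot(z[0] - v[0], z[1] - v[1]) for v in net)


gamma = 0.1
net = circle_net(gamma)
print(len(net), (1 + 2 / gamma) ** 2)  # net size versus the volumetric bound
```

In higher dimensions, explicit nets are harder to write down, which is why the proof only needs the existence bound on $|V|$.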

In Section III, for a $z$ on the unit Euclidean sphere, we establish the concentration of $\|Hz\|_1$.

Lemma 2.2: Let $\|z\|_2 = 1$. For any $\epsilon > 0$, there exists a constant $c_1 > 0$ such that when $n$ is large enough, with probability at least $1 - 2e^{-\frac{(\epsilon S)^2}{2m(n+m-1)}} \ge 1 - 2e^{-c_1 n}$, it holds that $(1 - \epsilon)S \le \|Hz\|_1 \le (1 + \epsilon)S$, where $S \triangleq nE\{|X|\}$ and $X$ is a random variable following the Gaussian distribution $N(0, 1)$.
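A quick Monte Carlo check of Lemma 2.2, under the Gaussian model for $h$ assumed in this section: for a fixed unit-norm $z$, the ratio $\|Hz\|_1 / S$ with $S = nE|X| = n\sqrt{2/\pi}$ should cluster near 1 for large $n$. The sizes, seed, and function names below are arbitrary choices for illustration; this is a sanity check, not a proof.

```python
import math
import random


def l1_norm_Hz(h, z, n):
    """||Hz||_1 for the n x m Toeplitz matrix built from h = (h_{-m+2}, ..., h_n)."""
    m = len(z)
    # 0-based row i, column j holds h_{i-j+1}, stored at position i - j + m - 1.
    return sum(abs(sum(h[i - j + m - 1] * z[j] for j in range(m))) for i in range(n))


def concentration_ratios(n, m, trials, seed=0):
    """Samples of ||Hz||_1 / S for one fixed random unit-norm z and fresh Gaussian h."""
    rng = random.Random(seed)
    S = n * math.sqrt(2 / math.pi)  # S = n E|X| for X ~ N(0, 1)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(v * v for v in z))
    z = [v / norm for v in z]
    ratios = []
    for _ in range(trials):
        h = [rng.gauss(0.0, 1.0) for _ in range(n + m - 1)]
        ratios.append(l1_norm_Hz(h, z, n) / S)
    return ratios


ratios = concentration_ratios(n=2000, m=3, trials=20)
print(min(ratios), max(ratios))
```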

Also in Section III, for a $z$ on the unit Euclidean sphere, we establish concentration for the following property, which is similar to the condition in Theorem 2.1.

Lemma 2.3: Let $\|z\|_2 = 1$ and let $0 < \delta < 1$ be a constant. Then there exist a threshold $\beta > 0$ and a constant $c_2 > 0$ (depending on $m$ and $\beta$) such that, with probability at least $1 - e^{-c_2 n}$, for all subsets $K \subseteq \{1, 2, \ldots, n\}$ with cardinality $|K| \le \beta n$,

$$\|(Hz)_K\|_1 \le \frac{1-\delta}{2-\delta}\|Hz\|_1.$$

The above two lemmas indicate that, with overwhelming probability, the recovery condition in Theorem 2.1 holds for the discrete points of the $\gamma$-net. The following lemma extends the result to all points $z$ on the unit Euclidean sphere in $\mathbb{R}^m$.

Lemma 2.4: There exists a constant $c_3 > 0$ such that when $n$ is large enough, with probability $1 - e^{-c_3 n}$, the described Toeplitz-structured matrices have the following property: for every $z \in \mathbb{R}^m$ and every subset $K \subseteq \{1, \ldots, n\}$ with $|K| \le \beta n$,

$$\sum_{i \in \bar{K}} |(Hz)_i| - \sum_{i \in K} |(Hz)_i| \ge \delta' S \|z\|_2,$$

where $\delta' > 0$ is a constant.

Proof: For any given $\gamma > 0$, there exists a $\gamma$-net $V$ of cardinality less than $(1 + 2/\gamma)^m$. A $\gamma$-net $V$ is a set of points such that $\|v^l\|_2 = 1$ for all $v^l \in V$, and for any $z$ with $\|z\|_2 = 1$ there exists some $v^l \in V$ such that $\|z - v^l\|_2 \le \gamma$.

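Since each row of $H$ has $m$ i.i.d. $N(0, 1)$ entries and $\|v^l\|_2 = 1$, each element of $Hv^l$ is an $N(0, 1)$ random variable (although these elements are not independent). Applying a union bound to Lemmas 2.3 and 2.2, we know that for some $\delta > 0$ and for every $\epsilon > 0$, with probability at least $1 - 2e^{-cn}$ for some $c > 0$,

$$\|(Hv^l)_K\|_1 \le \frac{(1-\delta)(1+\epsilon)}{2-\delta} S \quad \text{for all } K \text{ with } |K| \le \beta n \qquad \text{(II.1)}$$

and

$$(1-\epsilon)S \le \|Hv^l\|_1 \le (1+\epsilon)S \qquad \text{(II.2)}$$

hold for a vector $v^l$ in $V$. Since $|V|$ does not grow with $n$, when $n$ is large enough a union bound over $V$ shows that (II.1) and (II.2) hold for all points in $V$ simultaneously with probability at least $1 - e^{-c_3 n}$ for some $c_3 > 0$.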

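For any $z$ such that $\|z\|_2 = 1$, there exists $v^0 \in V$ such that $\|z - v^0\|_2 \triangleq \gamma_1 \le \gamma$. Let $z_1$ denote $z - v^0$; then $\|z_1 - \gamma_1 v^1\|_2 \triangleq \gamma_2 \le \gamma_1 \gamma \le \gamma^2$ for some $v^1 \in V$. Repeating this process, we have

$$z = \sum_{j \ge 0} \gamma_j v^j,$$

where $\gamma_0 = 1$, $\gamma_j \le \gamma^j$, and $v^j \in V$.

Thus, for any $z \in \mathbb{R}^m$, we have $z = \|z\|_2 \sum_{j \ge 0} \gamma_j v^j$.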
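For any index set $K$ with $|K| \le \beta n$,

$$\sum_{i \in K} |(Hz)_i| \le \|z\|_2 \sum_{j \ge 0} \gamma_j \sum_{i \in K} |(Hv^j)_i| \le S\|z\|_2 \frac{(1-\delta)(1+\epsilon)}{(2-\delta)(1-\gamma)},$$

and, for $\epsilon \le 1$,

$$\sum_{i} |(Hz)_i| \ge \|z\|_2 \Big( \sum_i |(Hv^0)_i| - \sum_{j \ge 1} \gamma_j \sum_i |(Hv^j)_i| \Big) \ge \|z\|_2 \Big( (1-\epsilon)S - \sum_{j \ge 1} \gamma^j (1+\epsilon) S \Big) \ge S\|z\|_2 \Big( 1 - \epsilon - \frac{2\gamma}{1-\gamma} \Big).$$

Thus

$$\sum_{i \in \bar{K}} |(Hz)_i| - \sum_{i \in K} |(Hz)_i| = \sum_i |(Hz)_i| - 2\sum_{i \in K} |(Hz)_i| \ge S\|z\|_2 \Big( 1 - \epsilon - \frac{2\gamma}{1-\gamma} - \frac{2(1-\delta)(1+\epsilon)}{(2-\delta)(1-\gamma)} \Big).$$

As $\gamma, \epsilon \to 0$, the factor in parentheses tends to $1 - \frac{2(1-\delta)}{2-\delta} = \frac{\delta}{2-\delta} > 0$. So, given $\delta$, we can pick $\gamma$ and $\epsilon$ small enough that

$$\sum_{i \in \bar{K}} |(Hz)_i| - \sum_{i \in K} |(Hz)_i| \ge \delta' S \|z\|_2. \qquad ■$$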
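The successive approximation $z = \sum_{j \ge 0} \gamma_j v^j$ used in this proof is a greedy procedure: at each step, approximate the normalized residual by its nearest net point and subtract. A minimal sketch for $m = 2$ follows, reusing an equally spaced circle net (the helpers and parameter values are illustrative assumptions). It checks that $\gamma_j \le \gamma^j$ and that the leftover after $J$ terms has norm at most $\gamma^J$.

```python
import math


def circle_net(gamma):
    """Equally spaced gamma-net of the unit circle (0 < gamma <= 2)."""
    N = math.ceil(math.pi / (2 * math.asin(gamma / 2)))
    return [(math.cos(2 * math.pi * l / N), math.sin(2 * math.pi * l / N))
            for l in range(N)]


def greedy_expansion(z, net, terms):
    """Return coefficients gamma_j, net points v^j, and the final residual for unit-norm z."""
    r = list(z)
    coeffs, points = [], []
    for _ in range(terms):
        g = math.hypot(r[0], r[1])  # gamma_j = norm of the current residual
        if g == 0.0:
            break
        u = (r[0] / g, r[1] / g)    # residual direction on the unit circle
        v = min(net, key=lambda p: math.hypot(p[0] - u[0], p[1] - u[1]))
        coeffs.append(g)
        points.append(v)
        r = [r[0] - g * v[0], r[1] - g * v[1]]  # new residual has norm <= gamma * g
    return coeffs, points, r


gamma = 0.2
z = (math.cos(0.3), math.sin(0.3))
coeffs, points, residual = greedy_expansion(z, circle_net(gamma), terms=8)
print(coeffs)
```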

We can now establish one main result regarding the threshold for successful recovery with $\ell_1$-minimization.

Theorem 2.5: There exist constants $c_4 > 0$ and $\beta > 0$ such that when $n$ is large enough, with probability $1 - e^{-c_4 n}$, an $n \times m$ Toeplitz matrix $H$ has the following property: for every $x \in \mathbb{R}^m$ and every error $e$ whose support $K$ satisfies $|K| \le \beta n$, $x$ is the unique solution to the $\ell_1$-minimization problem (I.2).

Proof: Lemma 2.4 indicates that $\sum_{i \in \bar{K}} |(Hz)_i| - \sum_{i \in K} |(Hz)_i| \ge \delta' S \|z\|_2 > 0$ for every nonzero $z$. Then, by Theorem 2.1, $x$ is the unique solution to the $\ell_1$-minimization problem. ■
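The proof above relies on the constant $\delta'$ from Lemma 2.4 being positive. The sketch below simply evaluates the factor $1 - \epsilon - \frac{2\gamma}{1-\gamma} - \frac{2(1-\delta)(1+\epsilon)}{(2-\delta)(1-\gamma)}$ from the end of that proof for sample values (the choices of $\delta$, $\gamma$, and $\epsilon$ are arbitrary). It illustrates that the factor is positive only once $\gamma$ and $\epsilon$ are small, and that it approaches $\delta/(2-\delta)$.

```python
def lower_bound_factor(delta, gamma, eps):
    """Factor multiplying S*||z||_2 in the final bound of the proof of Lemma 2.4."""
    return (1 - eps - 2 * gamma / (1 - gamma)
            - 2 * (1 - delta) * (1 + eps) / ((2 - delta) * (1 - gamma)))


delta = 0.5
for gamma, eps in [(0.2, 0.05), (0.05, 0.05), (1e-9, 1e-9)]:
    print(gamma, eps, lower_bound_factor(delta, gamma, eps))
print("limit delta / (2 - delta) =", delta / (2 - delta))
```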

III. Proof of Concentration of Measure in Lemma 2.2 and Lemma 2.3

In this section, we prove Lemma 2.2 and Lemma 2.3. We use concentration of measure inequalities and Chernoff bounds for Gaussian random variables [17].

Proposition 3.1 (Gaussian concentration inequality for Lipschitz functions): Let $f: \mathbb{R}^d \to \mathbb{R}$ be a function that is Lipschitz with constant 1 (i.e., for all $a \in \mathbb{R}^d$ and $b \in \mathbb{R}^d$, $|f(a) - f(b)| \le \|a - b\|_2$). Then, for any $t > 0$, we have

$$P\big(|f(X) - E\{f(X)\}| \ge t\big) \le 2e^{-\frac{t^2}{2}},$$

where $X$ is a vector of $d$ i.i.d. standard Gaussian random variables $N(0, 1)$.

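Proof of Lemma 2.2: We show that, for any $\|z\|_2 = 1$, the function $f(h) = \|Hz\|_1$ is Lipschitz with constant $\sqrt{m(n+m-1)}$, where

$$h = (h_{-m+2}, h_{-m+3}, \ldots, h_n).$$

For two vectors $h_1$ and $h_2$, with corresponding Toeplitz matrices $H_1$ and $H_2$, by the triangle inequality for the $\ell_1$ norm and the Cauchy-Schwarz inequality,

$$\begin{aligned} |f(h_1) - f(h_2)| &\le \|(H_1 - H_2)z\|_1 \le \sum_{i=-m+2}^{n} |h_{1,i} - h_{2,i}| \times \|z\|_1 \\ &\le \sqrt{n+m-1}\,\|h_1 - h_2\|_2 \sqrt{m}\,\|z\|_2 = \sqrt{(n+m-1)m}\,\|h_1 - h_2\|_2, \end{aligned}$$

where $\|z\|_2 = 1$. Moreover, each $(Hz)_i$ is $N(0, 1)$, so $E\{f(h)\} = nE\{|X|\} = S$. Then a direct application of the Gaussian concentration inequality to $f / \sqrt{m(n+m-1)}$ with $t = \epsilon S / \sqrt{m(n+m-1)}$ leads to Lemma 2.2. ■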
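The Lipschitz bound can be spot-checked numerically. The sketch below draws random pairs $h_1, h_2$ and a random unit-norm $z$ per trial (sizes and seed are arbitrary) and reports the largest observed ratio $|f(h_1) - f(h_2)| / (\sqrt{m(n+m-1)}\,\|h_1 - h_2\|_2)$, which the argument above says can never exceed 1. Random sampling can only fail to find violations; it cannot prove the bound.

```python
import math
import random


def f(h, z, n):
    """f(h) = ||Hz||_1 for the Toeplitz H generated by h = (h_{-m+2}, ..., h_n)."""
    m = len(z)
    return sum(abs(sum(h[i - j + m - 1] * z[j] for j in range(m))) for i in range(n))


def worst_lipschitz_ratio(n, m, trials, seed=0):
    """Largest observed |f(h1) - f(h2)| / (L ||h1 - h2||_2) with L = sqrt(m(n+m-1))."""
    rng = random.Random(seed)
    L = math.sqrt(m * (n + m - 1))  # Lipschitz constant from the proof of Lemma 2.2
    worst = 0.0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        nz = math.sqrt(sum(v * v for v in z))
        z = [v / nz for v in z]
        h1 = [rng.gauss(0.0, 1.0) for _ in range(n + m - 1)]
        h2 = [rng.gauss(0.0, 1.0) for _ in range(n + m - 1)]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
        worst = max(worst, abs(f(h1, z, n) - f(h2, z, n)) / (L * dist))
    return worst


print(worst_lipschitz_ratio(n=50, m=4, trials=200))
```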



Proof of Lemma 2.3: Note that for a vector $z$ on the unit Euclidean sphere in $\mathbb{R}^m$, we have

$$\sum_{i \in K} |(Hz)_i| \le \sum_{i \in K} \sum_{j=1}^{m} |H_{i,j}|\,|z_j|,$$

and, since each column of $H$ restricted to the rows in $K$ contains distinct elements of $h$, by the Cauchy-Schwarz inequality,

$$\sum_{i \in K} \sum_{j=1}^{m} |H_{i,j}|\,|z_j| \le \|z\|_1 \sum_{j \in J} |h_j| \le \sqrt{m} \sum_{j \in J} |h_j|,$$

where $h_j$ is an element of $h$ and $J \subseteq \{-m+2, \ldots, n\}$ is the set of indices $j$ such that $h_j$ appears in $H_K$, the rows of $H$ indexed by $K$. So the cardinality $|J| \le mk$.

By the Toeplitz structure, the number of rows of $H$ that involve only $h_j$'s with $j \in \{-m+2, \ldots, n\} \setminus J$ is at least $n - k(2m-1)$. For a fixed vector $z$, there exists a set $I \subseteq \{1, 2, \ldots, n\} \setminus K$ with cardinality at least

$$\frac{n - k(2m-1)}{2m-1}$$

such that $(Hz)_i$, $i \in I$, are independent $N(0, 1)$ Gaussian variables; moreover, these $(Hz)_i$, $i \in I$, are independent of the $h_j$'s with $j \in J$.
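The counting step above can be checked directly. Row $i$ of $H$ uses $h_{i-m+1}, \ldots, h_i$, so the rows in $K$ touch at most $mk$ indices, and each row in $K$ rules out at most $2m-1$ rows. The sketch below (hypothetical helper names; 1-based row indices as in the text) lists the rows that share no $h_j$ with $H_K$, then greedily keeps rows at least $m$ apart, which share no $h_j$ with each other.

```python
def free_rows(n, m, K):
    """Index set J used by the rows in K, and the rows (1-based) sharing no h_t with K."""
    J = set()
    for i in K:
        J.update(range(i - m + 1, i + 1))  # row i involves h_{i-m+1}, ..., h_i
    rows = [r for r in range(1, n + 1) if J.isdisjoint(range(r - m + 1, r + 1))]
    return J, rows


def spaced_rows(rows, m):
    """Greedily keep rows at least m apart; such rows involve disjoint sets of h_t."""
    chosen = []
    for r in rows:
        if not chosen or r - chosen[-1] >= m:
            chosen.append(r)
    return chosen


n, m, K = 30, 3, [5, 6, 20]
J, rows = free_rows(n, m, K)
I = spaced_rows(rows, m)
print(len(J), len(rows), len(I))  # 7 19 7
```

Here $|J| = 7 \le mk = 9$, there are $19 \ge n - k(2m-1) = 15$ free rows, and $7 \ge 15/5$ mutually independent rows remain.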

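Thus, for a fixed support set $K$, since $\|(Hz)_K\|_1 > \frac{1-\delta}{2-\delta}\|Hz\|_1$ is equivalent to $\|(Hz)_K\|_1 > (1-\delta)\|(Hz)_{\bar{K}}\|_1$, the probability that

$$\|(Hz)_K\|_1 > \frac{1-\delta}{2-\delta}\|Hz\|_1$$

is smaller than the probability that

$$\sqrt{m} \sum_{j=1}^{mk} |h_j| > (1-\delta) \sum_{i=1}^{\frac{n - k(2m-1)}{2m-1}} |h_i|,$$

where, with a slight abuse of notation, the $h_i$'s and $h_j$'s are all i.i.d. $N(0, 1)$ Gaussian random variables.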



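Now we use the Chernoff bound. Writing $N' = \frac{n - k(2m-1)}{2m-1}$,

$$P\Big( \sqrt{m} \sum_{j=1}^{mk} |h_j| > (1-\delta) \sum_{i=1}^{N'} |h_i| \Big) \le \min_{\mu > 0} \Big( E\big[e^{\mu\sqrt{m}|h_j|}\big] \Big)^{mk} \Big( E\big[e^{-\mu(1-\delta)|h_i|}\big] \Big)^{N'}.$$

Now we have

$$E\big[e^{\mu\sqrt{m}|h_j|}\big] = \int_0^{\infty} \frac{2}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} e^{\mu\sqrt{m}\,x}\,dx = 2e^{\frac{m\mu^2}{2}} F(\mu\sqrt{m}),$$

where $F(t)$ is the cumulative distribution function of a standard Gaussian random variable $N(0, 1)$.

Similarly, we have

$$E\big[e^{-\mu(1-\delta)|h_i|}\big] = \int_0^{\infty} \frac{2}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} e^{-\mu(1-\delta)x}\,dx = 2e^{\frac{\mu^2(1-\delta)^2}{2}} \big(1 - F(\mu(1-\delta))\big).$$

Putting these back into the Chernoff bound,

$$\begin{aligned} \log P\Big( \sqrt{m} \sum_{j=1}^{mk} |h_j| > (1-\delta) \sum_{i=1}^{N'} |h_i| \Big) \le\; & mk\Big( \log 2 + \frac{m\mu^2}{2} + \log F(\mu\sqrt{m}) \Big) \\ & + \Big( \frac{n}{2m-1} - k \Big) \Big[ \log 2 + \frac{\mu^2(1-\delta)^2}{2} + \log\big(1 - F(\mu(1-\delta))\big) \Big]. \end{aligned}$$

Since there are $\binom{n}{k}$ possible support sets $K$, the probability $P$ that the inequality of Lemma 2.3 is violated for at least one support set $K$ is upper bounded by

$$\log P \le \log\binom{n}{k} + mk\Big( \log 2 + \frac{m\mu^2}{2} + \log F(\mu\sqrt{m}) \Big) + \Big( \frac{n}{2m-1} - k \Big) \Big[ \log 2 + \frac{\mu^2(1-\delta)^2}{2} + \log\big(1 - F(\mu(1-\delta))\big) \Big].$$

Let $k = \beta n$. Because $\log\binom{n}{k}/n \to H(\beta)$ as $n \to \infty$, where $H(\beta) = \beta\log(1/\beta) + (1-\beta)\log\big(1/(1-\beta)\big)$, as long as, for some $\mu > 0$,

$$H(\beta) + m\beta\Big[ \log 2 + \frac{m\mu^2}{2} + \log F(\mu\sqrt{m}) \Big] + \Big( \frac{1}{2m-1} - \beta \Big) \Big[ \log 2 + \frac{\mu^2(1-\delta)^2}{2} + \log\big(1 - F(\mu(1-\delta))\big) \Big] < 0,$$

then $\beta$ is within the correctable error region for all support sets $K$ with high probability.

We notice that the last quantity can always be made smaller than 0 by taking $\beta$ small enough: for fixed $\mu > 0$, the last bracket equals $\log E[e^{-\mu(1-\delta)|X|}] < 0$, while the other terms vanish as $\beta \to 0$. ■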
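The two moment generating functions used in the Chernoff bound are instances of $E[e^{a|X|}] = 2e^{a^2/2}F(a)$ for $X \sim N(0, 1)$ and real $a$ (take $a = \mu\sqrt{m}$, and $a = -\mu(1-\delta)$ together with $F(-t) = 1 - F(t)$). The sketch below compares this closed form with a plain trapezoidal evaluation of the integral; the integration range, step count, and sample values of $\mu$, $m$, $\delta$ are arbitrary choices.

```python
import math


def F(t):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * math.erfc(-t / math.sqrt(2))


def mgf_abs_closed_form(a):
    """E[exp(a|X|)] = 2 exp(a^2/2) F(a) for X ~ N(0, 1)."""
    return 2 * math.exp(a * a / 2) * F(a)


def mgf_abs_numeric(a, upper=40.0, steps=20000):
    """Trapezoidal rule for the integral of 2/sqrt(2 pi) exp(-x^2/2 + a x) over [0, upper]."""
    dx = upper / steps
    g = lambda x: 2 / math.sqrt(2 * math.pi) * math.exp(-x * x / 2 + a * x)
    total = 0.5 * (g(0.0) + g(upper)) + sum(g(i * dx) for i in range(1, steps))
    return total * dx


mu, m, delta = 1.0, 3, 0.1
for a in (mu * math.sqrt(m), -mu * (1 - delta)):
    print(a, mgf_abs_closed_form(a), mgf_abs_numeric(a))
```

For negative $a$ the closed form is below 1, so its logarithm is negative; this is the fact used to make the exponent negative for small $\beta$.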



IV. Weak Thresholds 

Theorem 4.1: For any $\beta < 1$, as $n \to \infty$, the system state can be recovered perfectly using $\ell_1$ minimization, with high probability, from an error vector with $\beta n$ nonzero elements on a fixed support.

Proof: Without loss of generality, we assume that the support set of the error vector $e$ is $K$ and all the nonzero elements of $e$ are positive. Then, by the triangle inequality, $x$ can be recovered perfectly if and only if, for every nonzero vector $z \in \mathbb{R}^m$,

$$\sum_{i \in K} (Hz)_i < \|(Hz)_{\bar{K}}\|_1,$$

where $\bar{K}$ is the complement of the set $K$.
Let us define the function

$$f(h) = \sum_{i \in K} (Hz)_i - \|(Hz)_{\bar{K}}\|_1.$$

Similar to the previous arguments, for a vector $z$ with $\|z\|_2 = 1$, the function $f(h)$ has Lipschitz constant $\sqrt{m(n+m-1)}$. So, by the Gaussian concentration inequality for Lipschitz functions,

$$P\big(|f(h) - E\{f(h)\}| \ge t\big) \le 2e^{-\frac{t^2}{2m(n+m-1)}}.$$

And, since each $(Hz)_i$ is $N(0, 1)$ with zero mean,

$$E\{f(h)\} = -(n - |K|)E|X|,$$

where $X$ is a random variable following the standard Gaussian distribution $N(0, 1)$. So, with probability at least $1 - 2e^{-\frac{\epsilon^2 (n-|K|)^2 (E|X|)^2}{2m(n+m-1)}}$,

$$|f(h) - E\{f(h)\}| < \epsilon(n - |K|)E|X|,$$

in which case $f(h) < -(1-\epsilon)(n-|K|)E|X| < 0$ for $\epsilon < 1$. Because $n - |K| = (1-\beta)n$ grows linearly in $n$, the failure probability decays exponentially in $n$.

From Lemma 2.2 and $\gamma$-net arguments similar to those used for the strong thresholds, with high probability over the distribution of $h$, $f(h)$ is negative simultaneously for every unit-norm $z$ in $\mathbb{R}^m$. Thus, we have proven the theorem. ■
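A Monte Carlo illustration of the concentration step follows. It does not cover the uniform-over-$z$ statement, which needs the $\gamma$-net argument. For a fixed support $K$ with positive error signs, one unit-norm $z$ per trial, and Gaussian $h$, the sample mean of $f(h)$ should be close to $-(n-|K|)E|X|$ even when $|K|$ is half of $n$. All sizes, $\beta$, the seed, and the helper names are illustrative assumptions.

```python
import math
import random


def weak_f(h, z, n, K):
    """f(h) = sum_{i in K} (Hz)_i - ||(Hz)_{K-bar}||_1 with 0-based row indices."""
    m = len(z)
    Hz = [sum(h[i - j + m - 1] * z[j] for j in range(m)) for i in range(n)]
    Kset = set(K)
    return (sum(Hz[i] for i in Kset)
            - sum(abs(Hz[i]) for i in range(n) if i not in Kset))


def weak_experiment(n, m, beta, trials, seed=0):
    """Samples of f(h) for a fixed support K, plus the predicted mean -(n - |K|) E|X|."""
    rng = random.Random(seed)
    K = rng.sample(range(n), int(beta * n))  # fixed support of the gross errors
    values = []
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        nz = math.sqrt(sum(v * v for v in z))
        z = [v / nz for v in z]
        h = [rng.gauss(0.0, 1.0) for _ in range(n + m - 1)]
        values.append(weak_f(h, z, n, K))
    expected = -(n - len(K)) * math.sqrt(2 / math.pi)  # -(n - |K|) E|X|
    return values, expected


values, expected = weak_experiment(n=400, m=3, beta=0.5, trials=200)
print(sum(values) / len(values), expected)
```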




Fig. 1: Strong thresholds versus m



V. Numerical Evaluations 

In this section, based on the derivations in the proof of Lemma 2.3, we calculate the strong thresholds for different values of $m$ (Fig. 1). As we can see, as $m$ increases, the correlation length in the matrix $H$ also increases, and the corresponding strong threshold decreases. Note that for all $m$, the weak threshold is always 1.
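The strong-threshold computation can be sketched as follows. The sketch assumes natural logarithms, a finite grid over $\mu$, a fixed small $\delta = 0.01$, and a scan over $\beta$ below $1/(2m-1)$ that stops at the first $\beta$ where the exponent from the proof of Lemma 2.3 is no longer negative. These numerical choices and the helper names are ours, so the output is only indicative and need not reproduce Fig. 1 exactly.

```python
import math


def log_F(t):
    """log of the standard Gaussian CDF, computed via erfc for numerical stability."""
    return math.log(0.5 * math.erfc(-t / math.sqrt(2)))


def binary_entropy(beta):
    """H(beta) = beta log(1/beta) + (1 - beta) log(1/(1 - beta)), natural log."""
    return beta * math.log(1 / beta) + (1 - beta) * math.log(1 / (1 - beta))


def exponent(beta, m, delta, mu):
    """H(beta) + m beta [...] + (1/(2m-1) - beta) [...] from the proof of Lemma 2.3."""
    first = math.log(2) + m * mu ** 2 / 2 + log_F(mu * math.sqrt(m))
    # log(1 - F(t)) = log F(-t)
    second = math.log(2) + mu ** 2 * (1 - delta) ** 2 / 2 + log_F(-mu * (1 - delta))
    return binary_entropy(beta) + m * beta * first + (1 / (2 * m - 1) - beta) * second


def strong_threshold(m, delta=0.01, beta_steps=2000, mus=None):
    """Largest scanned beta for which the minimum over mu of the exponent is negative."""
    if mus is None:
        mus = [0.01 * i for i in range(1, 501)]  # mu in (0, 5]
    beta_max = 1 / (2 * m - 1)
    best = 0.0
    for s in range(1, beta_steps):
        beta = beta_max * s / beta_steps
        if min(exponent(beta, m, delta, mu) for mu in mus) < 0:
            best = beta
        else:
            break  # stop at the first scanned beta that fails
    return best


for m in (1, 2, 3):
    print(m, strong_threshold(m))
```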

VI. Conclusion 

In this paper, we studied system identification under sparse gross errors using $\ell_1$ minimization. In this problem, system parameters are observed through a Toeplitz matrix, and some of the observations are corrupted by gross errors. We reduced this system identification problem to a sparse error correction problem with a Toeplitz-structured real-numbered coding matrix. We established a performance guarantee for Toeplitz-structured matrices in sparse error correction. Thresholds on the percentage of correctable errors are also established as a function of the system memory length. One interesting direction for future work is to investigate the thresholds when the system parameter $m$ grows proportionally with $n$. Even determining whether strong thresholds exist in such a scenario for Toeplitz matrices would be of interest.